The Effectiveness of Corpus-Induced Dependency Grammars for Post-processing Speech
Authors
Abstract
This paper investigates the impact of Constraint Dependency Grammars (CDG) on the accuracy of an integrated speech recognition and CDG parsing system. We compare a conventional CDG with CDGs that are induced from annotated sentences and template-expanded sentences. The grammars are evaluated on parsing speed, precision/coverage, and improvement of word and sentence accuracy of the integrated system. Sentence-derived CDGs significantly improve recognition accuracy over the conventional CDG but are less general. Expanding the sentences with templates provides a mechanism for increasing the coverage of the grammar with only minor reductions in recognition accuracy.

1 Background

The question of when and how to integrate language models with speech recognition systems is gaining in importance as the recognition tasks investigated by the speech community become increasingly challenging and as speech recognizers are used in human/computer interfaces and dialog systems. Many systems tightly integrate N-gram stochastic language models, whose power is limited to that of a regular grammar, to build more accurate speech recognizers. However, in order to act on the spoken interaction with the user, the speech signal must be mapped to an internal representation. Obtaining a syntactic representation of the spoken utterance has a high degree of utility for mapping to a semantic representation. Without a structural analysis of the input, it is difficult to guarantee the correctness of the mapping from a sentence to its interpretation (e.g., mathematical expressions to internal calculations). We believe that significant additional improvement in accuracy can be gained in specific domains by using a more complex language model that combines syntactic, semantic, and domain knowledge.
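To make concrete what "an N-gram model with the power of a regular grammar" means in the paragraph above, here is a minimal sketch of a maximum-likelihood bigram language model. This is an illustration of the general technique only, not the model used in the paper; the function names and the absence of smoothing are assumptions made for brevity.

```python
from collections import defaultdict

def train_bigram_lm(sentences):
    """Count bigrams over tokenized sentences, with sentence-boundary markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    # Normalize counts into conditional probabilities P(cur | prev).
    return {prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
            for prev, nxt in counts.items()}

def sentence_prob(lm, sent):
    """Product of bigram probabilities; unseen bigrams get 0.0 (no smoothing)."""
    prob = 1.0
    tokens = ["<s>"] + sent + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= lm.get(prev, {}).get(cur, 0.0)
    return prob
```

Because each word's probability depends only on the immediately preceding word, the model is equivalent to a weighted finite-state (regular) grammar, which is precisely the limitation that motivates pairing the recognizer with a more powerful CDG parser.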
A language processing module that is more powerful than a regular grammar can be loosely, moderately, or tightly integrated with the spoken language system, and there are advantages and disadvantages associated with each choice (Harper et al., 1994). Tightly integrating a language model with the power of a context-free grammar into the acoustic module requires that the power of the two modules be matched, making the integrated system fairly intractable and difficult to train. By separating the language model from the acoustic model, it becomes possible to use a more powerful language model without increasing computational costs or the amount of acoustic training data required by the recognizer. Furthermore, a loosely integrated language model can be developed independently of the speech recognition component, which is clearly an advantage. Decoupling the acoustic and language models also adds flexibility: a …
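The loose integration described above is commonly realized by having the recognizer emit an N-best list of hypotheses with acoustic scores, which a separately developed language model then re-ranks. The sketch below assumes log-domain scores and a single interpolation weight; these are illustrative assumptions, not the paper's configuration.

```python
def rescore_nbest(nbest, lm_score, lm_weight=0.5):
    """Re-rank N-best hypotheses by combining acoustic and LM log-scores.

    nbest:     list of (hypothesis_words, acoustic_log_score) pairs
    lm_score:  any function words -> log-probability; it can be developed
               independently of the recognizer, as in a loosely
               integrated system
    lm_weight: relative weight of the language model contribution
    """
    rescored = [(words, acoustic + lm_weight * lm_score(words))
                for words, acoustic in nbest]
    # Best (highest combined log-score) hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

For example, a hypothesis that is slightly worse acoustically can win after rescoring if the external language model strongly prefers it, which is exactly how a post-processing parser or grammar can correct recognition errors.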
Similar papers
Learning Probabilistic Dependency Grammars from Labeled Text
We present the results of experimenting with schemes for learning probabilistic dependency grammars for English from corpora labelled with part-of-speech information. We intend our system to produce wide-coverage grammars which have some resemblance to the standard context-free grammars of English which grammarians and linguists commonly exhibit as examples.
Things between Lexicon and Grammar
A number of grammar formalisms were proposed in the 1980s, such as Lexical Functional Grammars, Generalized Phrase Structure Grammars, and Tree Adjoining Grammars. Those formalisms then began to place stress on the lexicon, and came to be called lexicalist (or lexicalized) grammars. Representative examples of lexicalist grammars are Head-driven Phrase Structure Grammars (HPSG) and Lexicalized Tree Adjoi...
Two Experiments on Learning Probabilistic Dependency Grammars from Corpora
Introduction We present a scheme for learning probabilistic dependency grammars from positive training examples plus constraints on rules. In particular, we present the results of two experiments. The first, in which the constraints were minimal, was unsuccessful. The second, with significant constraints, was successful within the bounds of the task we had set. We will explicate dependency gramm...
On the use of probabilistic grammars in speech annotation and segmentation tasks
The present paper explores the issue of corpus prosodic parsing in terms of prosodic words. This question is of importance in both speech processing and corpus annotation studies. We propose a method grounded in both statistical and symbolic (phonological) representations of tonal phenomena, and we have recourse to probabilistic grammars, within which we implement a minimal prosodic hierarchica...